Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: OpenAI Ingredient Parsing #3581

Conversation

michael-genson
Copy link
Collaborator

@michael-genson michael-genson commented May 10, 2024

What type of PR is this?

(REQUIRED)

  • feature

What this PR does / why we need it:

(REQUIRED)

This PR opens the door to implementing OpenAI in Mealie, and implements a new OpenAI ingredient parser. At a high level, this adds an OpenAI service that manages stored prompts and data injection to call the OpenAI API and receive a JSON response (which we then parse into a Pydantic model).

To enable OpenAI features, users need to include their OpenAI API key in the backend config (using the OPENAI_API_KEY env var). There are a few other configuration options to tweak performance vs cost (since the API isn't free).

Since OpenAI configuration is done via environment variables, this doesn't require any DB migrations.


The way this works is we have stored prompts which get sent to OpenAI to instruct it on what to do, i.e. "You are a bot designed to parse ingredients for recipes" (the actual prompt is much longer and goes into far more detail). It then sends a JSON list of inputs as the user message for it to process.

The OpenAI API supports returning its response in JSON format, which is perfect for FastAPI/Pydantic validation. I used Pydantic's BaseModel.model_dump_json() to inject the expected response schema into the prompt, which makes GPT always respond in a parsable format.

From there, implementing an interface is simple:

  1. send inputs to OpenAI
  2. receive predictable JSON string
  3. parse into Pydantic model
  4. push through whatever existing service we have)

Our OpenAI service handles the prompt injection, additional data injection (see below), and API handling, you just need to provide it the data and a description of how to use the data.


For the parser I opted to serialize our unit store and send it along with the rest of the prompt. This gives GPT some training data to say "you should expect to see these units". Originally I also included foods, but it didn't seem to help much at all (and adding the entire food store racks up API costs. This is configurable in the env settings: if you want to reduce costs, you can skip the optional data injection.

The OpenAI API isn't very fast when the responses are long. I took a bunch of measures to optimize this, but you can also split the ingredients into chunks and send multiple async requests (one for each chunk). This speeds up the parse time considerably, but costs more. The worker count is configurable in the env settings.


This PR also adds some QoL features on the frontend for parsing ingredients. Namely:

  • The last parser you chose is stored in user settings (so if you like the OpenAI parser you don't have to keep clicking on it)
  • While the backend is parsing ingredients, parse/save are disabled, and a loading animation runs (since OpenAI parsing takes some time, this was necessary, although it's nice for the other parsers too)

I've also hid the OpenAI ingredient parser if OpenAI isn't enabled (i.e. you haven't provided an API key).

Which issue(s) this PR fixes:

(REQUIRED)

N/A, though it has been discussed on and off

Special notes for your reviewer:

(fill-in or delete this section)

The prompts (this one and future ones) will likely go through a bunch of iterations before we're in that "sweet spot" of how to get the best results out of GPT. Ideally in the future it will need to be optimized for newer models (we may even decide to have different prompts for different models), but this is why I specifically included an env var for the OpenAI model to use (so that we aren't forced to keep up with the rapidly evolving AI space), sort of like pinning a package version.

This opens up some exciting possibilities in the future, such as importing strange recipe sources (unstructured data, OCR, etc.).

Testing

(fill-in or delete this section)

You need an OpenAI API key to properly test this, but I added a mocked test just to confirm it works fine as long as we get data from OpenAI.

@michael-genson michael-genson force-pushed the feat/open-ai-ingredient-parsing branch from a302c75 to 5796c1f Compare May 10, 2024 22:12
@boc-the-git
Copy link
Collaborator

I've only skimmed it, this is not a definitive review!

Is it possible to make the endpoint configurable? I've not across all the details but I believe a lot of the self hosted LLM projects have "OpenAI compatible" endpoints. It would be great if we can easily support those as well, particularly given Mealie is in the the same self hosting space.

(Obviously, I'd be happy for any change here to be a subsequent PR)

@michael-genson
Copy link
Collaborator Author

It might be possible, but it would require a lot of extra work for a few reasons, which I've stayed away from in this PR:

  • OpenAI specifically supports a JSON response, which to my knowledge is unique to OpenAI, and we rely heavily on this
  • I used the OpenAI API library/client because it covers a lot of edge-cases, but it's tied to OpenAI. We'd have to switch to LangChain directly which would be a more significant lift, and would likely share very little code with the OpenAI implementation

@jaasonw
Copy link

jaasonw commented May 14, 2024

Can you elaborate on what you mean by "JSON response unique to OpenAI"?

I believe what boc is saying is projects like ollama have an OpenAI-compatible API, allowing it to act as a drop-in replacement to the endpoint in the OpenAI library

@michael-genson
Copy link
Collaborator Author

Can you elaborate on what you mean by "JSON response unique to OpenAI"?

Specifically OpenAI's JSON mode: https://platform.openai.com/docs/guides/text-generation/json-mode

projects like ollama have an OpenAI-compatible API

I didn't realize that works even with the OpenAI library, that's super nice. Looks like we can just make the OpenAI base URL customizable and enable this. What I wanted to avoid was writing a custom client to interact with OpenAI (since it's a lot to maintain and really out of scope of Mealie)

@eikaramba
Copy link

if i understand this correctly the input is still the meta data from a website in a open recipe format. right? because i am parsing a websites html with gpt as not every website has the recipe in a structured format. i stumbled across multiple examples where the recipe is only available in the text/html and chatgpt needs to intelligently parse it to a json. works very good actually

@michael-genson
Copy link
Collaborator Author

michael-genson commented May 15, 2024

the input is still the meta data from a website in a open recipe format

Correct, this PR is not for scraping websites and generating recipes. This is for recipes that have already been imported, but their ingredients are not yet parsed.

However, I do have plans to support alternative import methods using OpenAI, building off of the foundation of this PR. Theoretically we can fall back to parsing a website with OpenAI when recipe metadata isn't available.

@michael-genson
Copy link
Collaborator Author

There are a few different discussions that I think are great ways to apply this to other areas of Mealie later down the road

@felixschndr
Copy link
Contributor

Does this required a paid OpenAI account? The default model is gpt-4o, however when setting this to something like gpt the calls should be free, right?

@michael-genson
Copy link
Collaborator Author

michael-genson commented May 16, 2024

You may use any LLM that has an OpenAI-compatible API. For instance, see ollama posted above. You just need to specify your own base_url, api_key, and model name for your model.

I've only tested with gpt-4 (and its variants) so I can only confirm that those work, however it's fully configurable per-instance. I will say that with gpt-4 you blow through the free tier extremely quickly. I've built in some measures to reduce costs as well as give some configurability to trade off speed vs cost. With gpt-4o it seems to cost 5-10 cents per parsed recipe (with 2 workers and ~10 ingredients)

@jaasonw
Copy link

jaasonw commented May 16, 2024

With gpt-4o it seems to cost 5-10 cents per parsed recipe (with 2 workers and ~10 ingredients)

Is there a reason to prefer a more powerful and more expensive model than 3.5-turbo ($0.50/1M tokens) as the default?

@michael-genson
Copy link
Collaborator Author

Short answer: No not really, but the default hardly matters when it doesn't work out of the box anyway; at a minimum you need to supply an API key, so there's nothing stopping you from also setting the model.

Longer answer: I've had a lot more success with GPT-4 when it comes to anything other than conversational interaction. GPT-3.5 is also a lot more moody when it comes to following prompts. GPT-4 is also much better at parsing non-english languages, which is particularly important for a parser that needs to understand grammar

Copy link
Collaborator

@boc-the-git boc-the-git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I particularly like the introduction of parserLoading.

Let's get this in front of people and see what feedback comes through!

@boc-the-git boc-the-git enabled auto-merge (squash) May 22, 2024 09:37
@boc-the-git boc-the-git merged commit 5c57b3d into mealie-recipes:mealie-next May 22, 2024
10 checks passed
@michael-genson michael-genson deleted the feat/open-ai-ingredient-parsing branch May 22, 2024 14:25
boc-the-git pushed a commit to boc-the-git/mealie that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants